Using Text Clustering for Intelligence Classification
Abstract
In this paper, we discuss how text mining methods could be used in a mixed-initiative interaction approach to intelligence analysis. We describe how simple methods from text mining can help intelligence analysts determine where a specific report or analysis fits into the knowledge base (KB), i.e., how it should be classified and which, if any, other documents in the KB it should be linked to. The method works by comparing the vector space model representation of the new document with those of all documents previously stored in the knowledge base. Documents that are sufficiently similar to the new piece of information are displayed to the user, who can then choose to place links between them. A computer tool such as the one suggested here potentially allows the analyst to spend more time analyzing intelligence reports and less time searching for and classifying them. In previous work, we discussed how MilWiki, an improved implementation of the open-source MediaWiki system, could be used as a knowledge base for military purposes. To illustrate the text classification method described in this paper, we have implemented it for MilWiki. To simulate new pieces of information, the prototype allows the user to download articles from Wikipedia. The method, as well as the collaborative work process used in a wiki, could be implemented in any content management system. In addition to describing the text classification method, we also give a brief introduction to text mining and the vector space model of documents.
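The comparison step described in the abstract can be sketched roughly as follows. This is only an illustration, not the MilWiki implementation: the abstract names the vector space model but not the term weighting or the similarity measure, so TF-IDF weighting, cosine similarity, the threshold value, and the names `tfidf_vectors`, `cosine`, and `suggest_links` are all assumptions made for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and split into alphanumeric tokens (a deliberately simple tokenizer).
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    # Build one sparse TF-IDF vector (dict: term -> weight) per document.
    # Assumption: smoothed IDF, log((1 + n) / (1 + df)) + 1.
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    df = Counter(term for toks in tokenized for term in set(toks))
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    return [{t: tf * idf[t] for t, tf in Counter(toks).items()} for toks in tokenized]


def cosine(u, v):
    # Cosine similarity between two sparse vectors; 0.0 if either is empty.
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def suggest_links(new_doc, kb, threshold=0.1):
    # kb maps a document title to its text (hypothetical structure).
    # Returns (title, score) pairs at or above the threshold, most similar first.
    titles = list(kb)
    vectors = tfidf_vectors([kb[t] for t in titles] + [new_doc])
    new_vec = vectors[-1]
    scored = [(t, cosine(new_vec, v)) for t, v in zip(titles, vectors[:-1])]
    hits = [(t, s) for t, s in scored if s >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Toy knowledge base and incoming report, invented for illustration.
kb = {
    "Convoy report": "armored convoy sighted near northern bridge",
    "Bridge survey": "engineers inspected northern bridge supports",
    "Weather note": "heavy rain expected tomorrow evening",
}
suggestions = suggest_links("second armored convoy crossing northern bridge", kb)
for title, score in suggestions:
    print(f"{title}: {score:.2f}")
```

In this sketch the analyst stays in the loop: the function only ranks candidate documents, and the decision to create a link is left to the user, in line with the mixed-initiative approach the abstract describes. The threshold would need tuning on real reports.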
Related resources
A Joint Semantic Vector Representation Model for Text Clustering and Classification
Text clustering and classification are two main tasks of text mining. Feature selection plays a key role in the quality of the clustering and classification results. Although word-based features such as term frequency-inverse document frequency (TF-IDF) vectors have been widely used in different applications, their shortcoming in capturing semantic concepts of text motivated researchers to use...
An Improved K-Nearest Neighbor with Crow Search Algorithm for Feature Selection in Text Documents Classification
The Internet provides easy access to a variety of library resources. However, classifying documents within a large amount of data remains a problem, and finding specific documents demands time and energy. Grouping similar documents into specific classes can reduce the time needed to search for the required data, particularly text documents. This is further facilitated by using Artificial...
Using Swarm Intelligence Techniques in Document Management Systems
In the field of economics and business, the ever increasing amount of text documents written in different languages and the ever increasing dependence of people and organisations on such information require effective document retrieval, searching and classification mechanisms. Searching for groups of related documents has an important role in text mining and Document Management Systems. Swarm i...
Optimization and Application of OPTICS Algorithm on Text Clustering
Text clustering is of great importance in data mining, information fusion, artificial intelligence and some other fields. There are many methods in the literature that can be used to classify text. Most of them require some parameters, such as the number of categories, which must be assigned in advance or estimated during the classification process. However, it is difficult to determine these quantities in...
Robust Method for E-Maximization and Hierarchical Clustering of Image Classification
We developed a new semi-supervised EM-like algorithm that is given the set of objects present in each training image, but does not know which regions correspond to which objects. We have tested the algorithm on a dataset of 860 hand-labeled color images using only color and texture features, and the results show that our EM variant is able to break the symmetry in the initial solution. We compared...